DISTANCE FUNCTIONS AND ATTRIBUTE WEIGHTING IN A k-NEAREST NEIGHBORS CLASSIFIER WITH AN ECOLOGICAL APPLICATION∗

Authors

  • Alyssa C. Frazee
  • Matthew A. Hathcock
  • Samantha C. Bates Prins
Abstract

To assess the environmental health of a stream, field, or other ecological object, the characteristics of that object should be compared with those of a set of reference objects known to be healthy. Using streams as objects, we propose a k-nearest neighbors algorithm (Bates Prins and Smith, 2006) to find the appropriate set of reference streams to serve as a comparison set for any given test stream. Previous investigations of the k-nearest neighbors algorithm have used a variety of distance functions, the best of which has been the Interpolated Value Difference Metric (IVDM) proposed by Wilson and Martinez (1997). We propose two alternatives to the IVDM: Wilson and Martinez’s Windowed Value Difference Metric (WVDM) and the Density-Based Value Difference Metric (DBVDM) developed by Wojna (2005). We extend the WVDM and DBVDM to handle continuous response variables and compare these distance measures with the IVDM in the ecological k-nearest neighbors context. Additionally, we compare two existing attribute weighting schemes (Wojna, 2005) when applied to the IVDM, WVDM, and DBVDM, and we propose a new attribute weighting method for use with these distance functions. In assessing environmental impairment, the WVDM and DBVDM were slight improvements over the IVDM. Attribute weighting also increased the effectiveness of the k-nearest neighbors algorithm in this ecological setting.

∗ This research was supported by NSF grant NSF-DMS 0552577 and was conducted during an 8-week summer Research Experience for Undergraduates (REU).
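As a rough illustration of how attribute weights can change which reference streams are selected, the following minimal sketch picks the k nearest reference objects under a weighted distance. It is not the authors' method: a weighted Euclidean distance stands in for the IVDM, WVDM, and DBVDM, and the function names, attribute values, and weights are all hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two attribute vectors.

    Assumption: this is only a stand-in for the value difference
    metrics (IVDM, WVDM, DBVDM) named in the abstract.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def nearest_references(test, references, weights, k):
    """Return the indices of the k reference objects closest to `test`."""
    order = sorted(
        range(len(references)),
        key=lambda i: weighted_distance(test, references[i], weights),
    )
    return order[:k]


# Hypothetical data: each reference stream is described by two attributes.
references = [(1.0, 10.0), (2.0, 50.0), (1.5, 12.5), (8.0, 11.0)]
test_stream = (1.2, 11.0)

# Equal weights: both attributes contribute to the distance.
equal = nearest_references(test_stream, references, weights=(1.0, 1.0), k=2)

# Zero weight on the first attribute: only the second attribute matters.
second_only = nearest_references(test_stream, references, weights=(0.0, 1.0), k=2)

print(equal, second_only)
```

With these made-up values, switching the weights changes the comparison set from streams 0 and 2 to streams 3 and 0, which is the kind of effect an attribute weighting scheme is meant to control.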

Related resources

Class-Based Attribute Weighting for Time Series Classification

In this paper, we present two novel class-based weighting methods for the Euclidean nearest neighbor algorithm and compare them with global weighting methods considering empirical results on a widely accepted time series classification benchmark dataset. Our methods provide higher accuracy than every global weighting in nearly half of the cases and they have better overall performance. We concl...

Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients

Motivations and results: For classifying gene expression profiles or other types of medical data, simple rules are preferable to non-linear distance or kernel functions. This is because rules may help us understand more about the application in addition to performing an accurate classification. In this paper, we discover novel rules that describe the gene expression profiles of more than six sub...

Adaptive Boosting for Spatial Functions with Unstable Driving Attributes

Combining multiple global models (e.g. back-propagation based neural networks) is an effective technique for improving classification accuracy by reducing variance through manipulating training data distributions. Standard combining methods do not improve local classifiers (e.g. k-nearest neighbors) due to their low sensitivity to data perturbation. Here, we propose an adaptive attribute boos...

A case-based reasoning model that uses preference theory functions for credit scoring

doi:10.1016/j.eswa.2012.01.181. We propose a case-based reasoning (CBR) model that uses preference theory functions for similarity measurements between cases. As it is hard to select the right preferen...

Role of Heuristic Methods with variable Lengths In ANFIS Networks Optimum Design and Training

ANFIS systems have been much considered due to their acceptable performance in terms of creation of fuzzy classifier and training. One main challenge in designing an ANFIS system is to achieve an efficient method with high accuracy and appropriate interpreting capability. Undoubtedly, type and location of membership functions and the way an ANFIS network is trained are of considerable effect on...

Publication date: 2010